Abstract

A scheme for recognizing 3D objects from single 2D images is introduced. The scheme proceeds in two
stages. In the first stage, the categorization stage, the image is compared to prototype objects. For
each prototype, the view that most resembles the image is recovered, and, if the view is found to be
similar to the image, the class identity of the object is determined. In the second stage, the identification
stage, the observed object is compared to the individual models of its class, where classes are expected to
contain objects with relatively similar shapes. For each model, a view that matches the image is sought.
If such a view is found, the object's specific identity is determined. The advantage of categorizing the
object before it is identified is twofold. First, the image is compared to a smaller number of models,
since only models that belong to the object's class need to be considered. Second, the cost of comparing
the image to each model in a class is very low, because correspondence is computed once for the whole
class. More specifically, the correspondence and object pose computed in the categorization stage to align
the prototype with the image are reused in the identification stage to align the individual models with
the image. As a result, identification is reduced to a series of simple template comparisons. The paper
concludes with an algorithm for constructing optimal prototypes for classes of objects.

1 Introduction

Our world contains an overwhelming variety of objects.
While people demonstrate outstanding abilities to memorize
and recognize thousands of objects [27, 37, 38],
computer vision applications largely fail to accommodate
these numbers. Apparently, the main tool that enables
people to effectively handle this massive amount
of objects is categorization. By dividing the objects into
classes, the visual system is capable of concluding properties
of unfamiliar objects from their resemblance to
familiar ones. For familiar objects, categorization offers
an indexing tool into the stored library of object
representations.

Recognition can be performed at different "levels of
abstraction". For example, the same object can be recognized
as a face, a human face, or as a specific person's
face. Psychological studies suggest the existence of a preferred
level for recognition, called "the basic level of
abstraction" [33]. Existing computational schemes usually
approach recognition at one of two levels. Several
schemes attempt to classify objects at their basic level
of abstraction (we refer to this task as categorization),
while other schemes attempt to determine the specific
identity of objects (we refer to this task as identification).
This paper presents a novel approach for recognition
that combines the two tasks.

To see how the two tasks are related, consider the following
example. Suppose you are walking down a street,
and someone is coming towards you. You look at the
person's face, and it looks familiar, but you cannot tell
who it is. So you try to picture the people you know who
look like the person you see, until finally, you realize who
the person is.

A number of hypotheses can be drawn from this story.
First, recognition can be broken into two stages:
categorization and identification, where categorization is
believed to precede identification. Second, during the
course of recognition the image is compared against a
number of object models. Assuming that indeed categorization
precedes identification, only models that belong
to the object's class need to be considered. Finally, when
a new model is compared to the image, the comparison
process may benefit from the use of information acquired
during categorization. Note that the situation described
here is not specific to faces. One can imagine that similar
situations occur when other objects, such as animals,
cars, and chairs, are observed.

To see how information acquired during categorization
can be used for identification, consider the example
of face recognition. When a face is recognized, the image
positions of its parts and features are known. In particular,
an observer already knows where the eyes, nose,
and mouth are and can even infer the direction of gaze
and expression. The person's identity is not essential for
extracting and locating these features. Instead, they are
matched against features in a "generic" representation.
In addition, other features, such as a beard, hair style,
and wrinkles, that may better distinguish between different
persons may be located. More generally, we can
postulate that, during categorization, sub-structures of
the objects (such as parts and features) are extracted
and located with respect to a generic model, and the
object's pose is determined.

To follow this example, I propose a scheme for recognizing
3D objects from single 2D views that combines
the two stages, categorization and identification.
Categorization is achieved by aligning the image to prototype
objects. The prototype that appears most similar
to the image determines the class identity of the object.
After the object is categorized, its specific identity is
determined by aligning the observed object to individual
models of its class. By first categorizing the object, not
only is the number of models considered for identification
reduced, but the cost of comparing each model
to the image also decreases significantly. This is achieved by
reusing the correspondence and pose computed for the
prototype in the categorization stage to align the image
with the individual models. We show in this paper that,
although a perfect match between the prototype and the
image is not obtainable, the correspondence and pose
can be computed for the prototype, and can be used
to bring the image and the object's model into alignment.
Consequently, recovering the correspondence and
pose for the individual models becomes unnecessary, and
identification is reduced to a series of simple template
comparisons.

The rest of this paper is organized as follows. Section 2
reviews the main existing approaches for categorization
and identification. Section 3 presents the scheme of
recognition by prototypes. Section 4 proposes an algorithm
for generating optimal prototypes for the scheme.
Section 5 discusses the relevance of the scheme to human
recognition. Implementation results are presented
in Section 6.

2 Previous Approaches

Existing schemes for categorization often use a "reductionist"
approach. The image, which contains a detailed
appearance of an object, is transformed into a compact
representation that is invariant for all objects of the
same class. One common approach to generating such a
representation is by decomposing the object into parts.
Parts are extracted by cutting the object in concavities
[17, 22, 43] and labeled according to their general shape.
The labels, together with the spatial relationships between
the parts, are used to identify the class of the object
[4, 6, 7, 26]. A second approach extracts the parts of
the object that fulfill certain functions. The list of functions
is used to determine the object's class [16, 39, 47].

Schemes that break objects into parts are insufficient
to explain all the aspects of recognition for the following
reasons. First, in many cases objects that belong to the
same class differ only by their detailed shape, while they
share roughly the same set of parts. Moreover, even objects
that at some level may be considered belonging to
different classes, such as a cat and a dog, may also share
roughly the same set of parts. To solve this problem several
systems also store, in addition to the part structure
of the objects, the detailed shape of the parts [2, 6, 7].
Another problem is that many of the techniques for recognizing
objects by part decomposition rely on finding
the entire parts from the image.

To recognize the specific identity of objects, a relatively
detailed representation of the object's shape is
compared with the image. An example for such methods
is alignment [3, 9, 12, 13, 18, 25, 40, 41]. Alignment
involves recovering the position and orientation (pose) in
which the object is observed and comparing the appearance
of the object from that pose with the image. Only
a few attempts have been made in the past to extend
the alignment scheme to the problem of object categorization
(e.g., [36]). The main difficulty in applying the
alignment approach is the recovery of the pose of the
observed object. In most implementations this involves
a time-consuming stage for finding the correspondence
between the model and the image. The process becomes
impractical when the image is compared against a large
library of objects, because typically the correspondence
is established between the image and each of the models
in the library separately.

To handle large libraries, indexing methods were proposed
(e.g., [20, 46, 14]). The basic idea is the following.
A certain function is defined and applied to the views of
all the objects in the library. The object models are arranged
in a look-up table indexed by the obtained function
values. When an image is given, the function is
applied to the image, and the obtained value is used to
index into the table. To reduce the size of the table and
the complexity of its preparation, invariant functions,
that is, functions that return the same value for different
views of an object regardless of viewpoint, are often
used as the indexing functions.
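
The look-up idea can be illustrated with a toy sketch. It is not taken from any of the cited indexing methods: it assumes flat 2D point sets that only rotate and translate in the image plane, and it uses the sorted list of pairwise distances as a hypothetical invariant function. The names invariant_key, build_table, and rotate are illustrative only.

```python
import math
from collections import defaultdict

def invariant_key(points, decimals=3):
    """Sorted pairwise distances: unchanged by 2D rotation, translation, and point order."""
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return tuple(round(d, decimals) for d in sorted(dists))

def build_table(library):
    """Arrange the models in a look-up table indexed by the invariant value."""
    table = defaultdict(list)
    for name, points in library.items():
        table[invariant_key(points)].append(name)
    return table

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical two-model library.
library = {
    "triangle": [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)],
    "square": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
}
table = build_table(library)

# A rotated and shifted view of the triangle indexes back to the same entry.
view = [(x + 5.0, y - 1.0) for x, y in rotate(library["triangle"], 0.7)]
candidates = table.get(invariant_key(view), [])
```

Rounding the key is a crude way to absorb numerical noise; with only a few features per key, unrelated models can collide in the same entry.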

Indexing methods suffer from several shortcomings.
First, existing indexing methods handle only rigid objects.
Extending these methods to handle classes of objects
has not been discussed. Second, because of complexity
issues, indexing functions usually are applied to
small numbers of features. As a result, high rates of
false positives are obtained, and the effectiveness of the
indexing is reduced.

The scheme presented in this paper is designed to
work where traditional approaches to categorization and
indexing fail. The scheme combines both categorization
and identification of objects, and uses fairly detailed
representations for objects. Rather than indexing directly
to the specific object model, the scheme indexes into
the library of objects by categorizing the object. The
classes handled by the scheme include objects with relatively
similar shapes. To fit into the scheme, in some
cases basic level classes are broken into sub-classes. The
general problem of categorization therefore may require
additional tools.

3 Recognition by Prototypes

The recognition by prototypes scheme proceeds as follows.
A library of 3D object models is stored in memory.
The models in the library are divided into classes,
and 3D prototype objects are selected to represent the
classes. For every class, the models in the class are
aligned in the library with the prototype object. The
role of this 3D alignment will become clear shortly.



At recognition time, an incoming 2D image is first
matched against all of the prototypes. For each prototype
object, the system attempts to recover the view of
the prototype that most resembles the image. To do so,
the system recovers the correspondence between the prototype
and the image, and, using this correspondence, it
determines the transformation that best aligns the prototype
with the image. This transformation, referred
to as the prototype transform, is then applied to the
prototype, and the similarity between the transformed
prototype and the actual image is evaluated. Since the
observed object in general differs from the prototype object,
a perfect match between the two is not anticipated.
The system therefore seeks a prototype that reasonably
matches the image. Once such a prototype is found, the
class identity of the object is determined.

After the object's class is determined, the system attempts
to recover the specific identity of the object. At
this stage, the image is matched against all the models
of the object's class. For each of these models, the system
seeks to recover the transformation that aligns the
model with the image. As will be shown below, since the
models are aligned in the library with the prototype, the
transformation that best aligns the prototype with the
image is identical to the transformation that aligns the
model to the image. The prototype transform therefore
is applied to the specific models, and their appearance
from this pose is compared with the image. The model
that aligns with the image, if there is such, determines
the specific identity of the object.

The rest of this section is organized as follows. In
Section 3.1 the object representation used in our scheme is
presented. Section 3.2 describes the categorization stage,
and Section 3.3 describes the identification stage.

3.1 Object representation: the linear
combination scheme

In our scheme, an object is modeled by a matrix $M$
of size $n \times k$, where $n$ is the number of feature points,
and $k$ represents the degrees of freedom of the object.
A vector $\vec{a} \in \mathbb{R}^k$, referred to as the transform vector,
represents the transformation applied to the object in a
certain view, and the object's appearance from this view
is given by

$$\vec{v} = M \vec{a} \qquad (1)$$

In the rest of this section we explain the use of this notation.
The notation follows from the linear combination
scheme [42], which is briefly reviewed below.

Under the linear combination scheme an object is
modeled by a small set of views, each represented
by a vector containing point positions, where the points
in these views are ordered in correspondence. Novel
views of the object are obtained by applying linear
combinations to the stored views. Additional constraints
may apply to the coefficients of this linear combination.
Computing the object pose therefore requires recovering
the coefficients of the linear combination that align the
model with the image and verifying that the recovered
coefficients indeed satisfy the constraints. The method
handles rigid objects under weak-perspective projection
(namely, orthographic projection followed by a uniform
scaling). It was extended to approximate the appearance
of objects with smooth bounding surfaces and to handle
articulated objects. In our representation, the columns
of the model matrix $M$ contain views of the object, and
the coefficients of the linear combination that align the
model with the image are given by the transform vector
$\vec{a}$.

For concreteness, we review the linear combination
scheme for rigid objects. Consider a 3D object $O$ that
contains $n$ feature points $(X_i, Y_i, Z_i)$, $1 \le i \le n$. Under
weak-perspective projection, the position of the object
following a rotation $R$, translation $\vec{t}$, and scaling $s$ is
given by

$$x_i = s r_{11} X_i + s r_{12} Y_i + s r_{13} Z_i + t_x$$
$$y_i = s r_{21} X_i + s r_{22} Y_i + s r_{23} Z_i + t_y \qquad (2)$$

where $r_{ij}$ are the components of the rotation matrix $R$,
and $t_x$, $t_y$ are the horizontal and vertical components of
the translation vector $\vec{t}$, respectively.
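
As a minimal sketch of Eq. 2, the function below projects a list of 3D points under weak perspective. The helper rotation_about_y and the sample points are illustrative assumptions, not part of the original scheme.

```python
import math

def rotation_about_y(angle):
    """3x3 rotation matrix about the vertical axis (an illustrative choice of R)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def weak_perspective(points, R, s, tx, ty):
    """Eq. 2: rotate, drop the depth coordinate, scale by s, translate by (tx, ty)."""
    image = []
    for X, Y, Z in points:
        x = s * (R[0][0] * X + R[0][1] * Y + R[0][2] * Z) + tx
        y = s * (R[1][0] * X + R[1][1] * Y + R[1][2] * Z) + ty
        image.append((x, y))
    return image

# Hypothetical feature points: the origin and the three unit axis points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
view = weak_perspective(points, rotation_about_y(math.pi / 6), s=2.0, tx=1.0, ty=-1.0)
```

Only the first two rows of R enter the computation, which is why the depth after rotation never affects the image.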

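Denote by $\vec{X}, \vec{Y}, \vec{Z}, \vec{x}, \vec{y} \in \mathbb{R}^n$ the vectors of $X_i$, $Y_i$, $Z_i$, $x_i$,
and $y_i$ values, respectively, and denote $\vec{1} = (1, \ldots, 1) \in \mathbb{R}^n$.
We can rewrite Eq. 2 as a vector equation as follows:

$$\vec{x} = a_1 \vec{X} + a_2 \vec{Y} + a_3 \vec{Z} + a_4 \vec{1}$$
$$\vec{y} = b_1 \vec{X} + b_2 \vec{Y} + b_3 \vec{Z} + b_4 \vec{1} \qquad (3)$$

where $a_1 = s r_{11}$, $a_2 = s r_{12}$, $a_3 = s r_{13}$, $a_4 = t_x$,
$b_1 = s r_{21}$, $b_2 = s r_{22}$, $b_3 = s r_{23}$, and $b_4 = t_y$.
Therefore

$$\vec{x}, \vec{y} \in \mathrm{span}\{\vec{X}, \vec{Y}, \vec{Z}, \vec{1}\} \qquad (4)$$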

Different views of the object are obtained by changing
the rotation, scale, and translation parameters, and
these changes result in changing the coefficients in Eq. 3.
We may therefore conclude that all the views of a rigid
object are contained in a 4D linear space.

This property, that the views of a rigid object are
contained in a 4D linear space, provides a method for
constructing viewer-centered representations for the object.
The idea is to use images of the object to construct
a basis for this space. In general, two views provide sufficiently
many vectors. Therefore, any novel view is a
linear combination of two views [30, 42].

Not every linear combination is a valid view of a rigid
object. Following the orthonormality of the row vectors
of the rotation matrix, the coefficients in Eq. 3 must
satisfy the two quadratic constraints

$$a_1^2 + a_2^2 + a_3^2 = b_1^2 + b_2^2 + b_3^2$$
$$a_1 b_1 + a_2 b_2 + a_3 b_3 = 0 \qquad (5)$$

When the constraints are not satisfied, distorted (by
stretch or shear) pictures of the objects are generated.
In case a viewer-centered representation is used, the constraints
change in accordance with the selected basis. A
third view of the object can be used to recover the new
constraints.
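
The two quadratic constraints of Eq. 5 can be checked numerically. The sketch below assumes an object-centered model, so that the coefficients are those of Eq. 3; the function name and the tolerance handling are illustrative choices.

```python
import math

def satisfies_rigidity(a, b, tol=1e-9):
    """Check the two quadratic constraints of Eq. 5 on the first three coefficients."""
    norm_a = a[0] ** 2 + a[1] ** 2 + a[2] ** 2
    norm_b = b[0] ** 2 + b[1] ** 2 + b[2] ** 2
    dot_ab = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    scale = max(norm_a, norm_b, 1.0)  # makes the tolerance relative to the scaling s
    return abs(norm_a - norm_b) <= tol * scale and abs(dot_ab) <= tol * scale

theta, s = 0.3, 1.5
# Scaled first two rows of a rotation about the optical axis, plus a translation.
rigid_a = (s * math.cos(theta), -s * math.sin(theta), 0.0, 2.0)
rigid_b = (s * math.sin(theta), s * math.cos(theta), 0.0, -1.0)
# Coefficients that shear the picture: unequal norms and non-orthogonal rows.
sheared_a = (1.0, 0.5, 0.0, 0.0)
sheared_b = (0.0, 1.0, 0.0, 0.0)

rigid_ok = satisfies_rigidity(rigid_a, rigid_b)
sheared_ok = satisfies_rigidity(sheared_a, sheared_b)
```

The translation components (the fourth coefficients) do not enter the constraints, since any translation is a valid rigid motion.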

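For the purpose of this paper a model for a rigid object
can be constructed by building the following $n \times 4$ model
matrix

$$M = (\vec{X}, \vec{Y}, \vec{Z}, \vec{1})$$

The views of the object are then given by

$$\vec{x} = M \vec{a}, \qquad \vec{y} = M \vec{b} \qquad (6)$$

where $\vec{a} = (a_1, a_2, a_3, a_4)$ and $\vec{b} = (b_1, b_2, b_3, b_4)$ are the
coefficients from Eq. 3. Notice that the two linear systems
can be merged into one by constructing a modified
model matrix in the following way

$$\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \begin{pmatrix} \vec{a} \\ \vec{b} \end{pmatrix} \qquad (7)$$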

Similar constructions can be obtained for objects with
smooth bounding surfaces and for articulated objects.
The width of $M$, $k$, should then be modified according
to the degrees of freedom of the modeled object. As was
mentioned above, viewer-centered representations can be
obtained by constructing a basis for the 4D space from
images of the object. Therefore, viewer-centered models
can be obtained by replacing the column vectors of $M$
with the constructed basis.

To summarize, following the linear combination
scheme we can represent an object by a matrix $M$ and
construct views of the object by applying it to transform
vectors $\vec{a}$. For rigid objects not every transform
vector is valid; the components of the transform vector
must satisfy the two quadratic constraints. Recognition
involves recovering the transform vector $\vec{a}$ and verifying
that its components satisfy the two constraints. Ignoring
these constraints will result in recognizing the object
even when it undergoes a general 3D affine transformation.
In the analysis below we largely ignore the quadratic
constraints. These constraints, however, can be verified
both during the categorization stage and during
the identification stage.
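
The representation summarized above can be sketched in a few lines. The sketch assumes an object-centered rigid model, with the n x 4 matrix M = (X, Y, Z, 1) and the stacked 2n-vector of Eq. 7; the function names and sample points are illustrative.

```python
def model_matrix(points):
    """n x 4 model matrix M = (X, Y, Z, 1), one row per feature point."""
    return [[X, Y, Z, 1.0] for X, Y, Z in points]

def apply_model(M, coeffs):
    """One coordinate vector of a view: the product of M with a transform vector."""
    return [sum(m * c for m, c in zip(row, coeffs)) for row in M]

def merged_view(M, a, b):
    """Eq. 7: stack x = M a on top of y = M b into a single 2n-vector."""
    return apply_model(M, a) + apply_model(M, b)

# Hypothetical object with four feature points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
M = model_matrix(points)

# No rotation, scaling s = 2, translation (1, -1):
# a = (s r11, s r12, s r13, tx) and b = (s r21, s r22, s r23, ty).
a = (2.0, 0.0, 0.0, 1.0)
b = (0.0, 2.0, 0.0, -1.0)
v = merged_view(M, a, b)
# v == [1.0, 3.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
```

Keeping x and y as two products with the same M avoids building the block-diagonal matrix of Eq. 7 explicitly.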

3.2 Categorization

The recognition by prototypes scheme begins by determining
the object's category. This is achieved by comparing
the observed object to prototype objects, objects
that are "typical exemplars" for their classes. For a given
prototype, the view of the prototype that most resembles
the image is recovered and compared to the actual
image, and the result of this comparison determines the
class identity of the object.

We begin our description of the categorization stage
by defining the data structures used by the scheme. A
class $C = (P, \{M_1, M_2, \ldots, M_l\})$ is a pair that includes a
prototype $P$ and a set of object models $M_1, M_2, \ldots, M_l$.
Both the prototype and the models are represented by
$n \times k$ matrices, where $n$ defines the number of feature
points considered, and $k$ denotes the degrees of freedom
of the objects. For the sake of simplicity we assume here
that all the objects in the class share the same number
of feature points, $n$, and that they have similar degrees
of freedom, $k$. Note that similar objects tend to have
similar degrees of freedom (e.g., all of them are rigid).
Both assumptions are not strict, however. The scheme
can be modified to tolerate both a varying number of feature
points and different degrees of freedom. The
details will be discussed later in this paper. Note that
the objects can be modeled by either object-centered or
viewer-centered representations. In case viewer-centered
representations are used we shall assume that the models
represent the objects from the same range of viewpoints.

A class in our scheme contains objects with similar
shapes. These objects share roughly the same topologies,
and there exists a "natural" correspondence between
them. Consider, for instance, the two chairs in
Figure 1. Although the shapes of these chairs are different,
and some parts (e.g., the arms) appear only in
one chair and not in the other, a natural correspondence
between features in the two objects can be determined.

In the library of models, the natural correspondence
between objects is made explicit. It is specified by the
order of the row vectors of the models. Specifically, given
a prototype $P$ and object models $M_1, \ldots, M_l$, we order
the rows of these models such that the first feature point
of $P$ corresponds to the first feature point of each of the
models $M_1, \ldots, M_l$, and so forth.

Given the library of objects and given an incoming image,
the recognition by prototypes scheme begins by categorizing
the object observed in the image. To achieve
this goal, the prototype objects are aligned and compared
to the image. For every prototype, the correspondence
between the image and the prototype is first resolved,
and, using this correspondence, the nearest prototype
view is recovered. By doing so, the scheme decouples
the two factors that affect the appearance of the
object in the image, namely, view variations and shape
variations. By selecting the nearest prototype view to
the image, the scheme compensates for view variations.
Then, by evaluating the similarity between the nearest
prototype view and the actual image, it accounts for the
differences in shape between the prototype and the observed
object.

The first stage in matching the prototype to the image
involves the recovery of correspondence between prototype
and image features. In existing systems for recognizing
the specific identity of objects, establishing the
correspondence between images and object models involves
a time-consuming process in which sophisticated
algorithms are applied [10, 13, 15, 18, 23, 25, 35, 41].
These algorithms rely on the property that, when the
correct correspondence between a model and an image is
established, a near-perfect match between the two is obtained.
While this assumption is valid for identification,
it cannot be used under our scheme since the prototype
and the image generally represent different objects.

To determine the correspondence between the prototype
and the image, we define an objective function that
is applied to the prototype and the image under a given
correspondence and that obtains its minimum under the
correct correspondence. The objective function will measure
the quality of the match between the prototype and
the image. Namely, under this measure the correct correspondence
is the one that brings the prototype into
its best alignment with the image. Given this objective
function, correspondence is a combinatorial optimization
problem, and so minimization techniques can be used to
resolve the correspondence between the prototype and
the image. This paper does not propose a specific technique
for solving the correspondence problem.

Assuming the correspondence problem can be solved,
the scheme proceeds as follows. Given a prototype $P$
and an image $I$, we generate a view vector $\vec{v}$ from the
image by extracting the location of feature points and
arranging them in a vector. The points in $\vec{v}$ are ordered
in correspondence to the prototype points; that is, the
first point in $\vec{v}$ corresponds to the first point in $P$ and
so forth. The prototype transform is the transformation
that brings the prototype points as close as possible to
their corresponding image points. The prototype transform,
therefore, is the transform vector $\vec{b}$ that minimizes
the Euclidean distance between the prototype and image
points, namely

$$\min_{\vec{b}} \| P \vec{b} - \vec{v} \| \qquad (8)$$

A solution for (8) is obtained as follows. Assume $P$
is overdetermined; that is, $P$ is $n \times k$ where $n > k$ and
$\mathrm{rank}(P) = k$, and denote by $P^+ = (P^T P)^{-1} P^T$ the
pseudo-inverse of $P$. The prototype transform, $\vec{b}$, is given
by

$$\vec{b} = P^+ \vec{v} \qquad (9)$$

and the nearest prototype view $\vec{p}$ is obtained by applying
$P$ to the prototype transform $\vec{b}$, that is

$$\vec{p} = P \vec{b} = P P^+ \vec{v} \qquad (10)$$

The nearest prototype view is now compared to the
image, and their resemblance determines the class identity
of the object. The quality of the match between the
prototype and the image is defined by

$$D(P, \vec{v}) = \| \vec{p} - \vec{v} \| = \| (P P^+ - I) \vec{v} \| \qquad (11)$$

To eliminate effects due to scaling of the object, this
measure should be normalized, as is illustrated by the
example below. Consider an object seen from some view
$\vec{v}_1$. Its distance to the prototype is given by $D(P, \vec{v}_1)$.
Suppose the object is now seen from a new view $\vec{v}_2$ that
is identical to $\vec{v}_1$ except that the object is now twice
as close to the camera. Under these conditions $\vec{v}_2 = 2 \vec{v}_1$,
and its distance to the prototype is given by $D(P, \vec{v}_2) =
2 D(P, \vec{v}_1)$. Clearly, we should have a measure that is
independent of the distance of the object to the camera.
One way to obtain such a measure is by dividing the
distance in (11) by the norm $\| \vec{v} \|$, which gives the
normalized measure used from here on:

$$D(P, \vec{v}) = \frac{\| (P P^+ - I) \vec{v} \|}{\| \vec{v} \|} \qquad (12)$$

$D(P, \vec{v})$ is proposed here as an objective function for
establishing the correspondence between the prototype
and the image. In other words, we expect that if the object
belongs to the prototype's class then $D(P, \vec{v})$ obtains
its minimal value when $\vec{v}$ is ordered in correspondence to
$P$. Any other permutation will increase the value of $D$.
Formally, denoting by $\sigma$ a permutation matrix, we assume
that

$$D(P, \vec{v}) = \min_{\sigma} D(P, \sigma \vec{v}) \qquad (13)$$
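
Since no specific technique for the correspondence problem is proposed here, the following is only a brute-force illustration of (13): it tries every ordering of the image points and keeps the one with the smallest normalized distance. It assumes a rigid prototype $P = (\vec{X}, \vec{Y}, \vec{Z}, \vec{1})$ with linearly independent columns and evaluates the residual with an orthonormal basis of the column space of $P$, which is equivalent to applying $P P^+ - I$. The cost grows as $n!$, so it is feasible only for a handful of points; all names and the sample data are hypothetical.

```python
import math
from itertools import permutations

def orthonormal_basis(P):
    """Gram-Schmidt on the columns of P (assumed linearly independent)."""
    basis = []
    for j in range(len(P[0])):
        col = [row[j] for row in P]
        for q in basis:
            dot = sum(c * qi for c, qi in zip(col, q))
            col = [c - dot * qi for c, qi in zip(col, q)]
        norm = math.sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])
    return basis

def residual_sq(basis, v):
    """Squared norm of (P P^+ - I) v: the part of v outside the column space of P."""
    r = list(v)
    for q in basis:
        dot = sum(ri * qi for ri, qi in zip(r, q))
        r = [ri - dot * qi for ri, qi in zip(r, q)]
    return sum(ri * ri for ri in r)

def distance(basis, image_points):
    """Normalized measure of Eq. 12 for the stacked view (x; y) of Eq. 7."""
    xs = [p[0] for p in image_points]
    ys = [p[1] for p in image_points]
    total = sum(x * x for x in xs) + sum(y * y for y in ys)
    return math.sqrt((residual_sq(basis, xs) + residual_sq(basis, ys)) / total)

def best_correspondence(P, image_points):
    """Exhaustive version of Eq. 13: returns (minimal distance, ordering)."""
    basis = orthonormal_basis(P)
    return min(
        (distance(basis, [image_points[i] for i in order]), order)
        for order in permutations(range(len(image_points)))
    )

# Hypothetical prototype and an affine view of it (quadratic constraints ignored).
prototype = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
P = [[float(X), float(Y), float(Z), 1.0] for X, Y, Z in prototype]
ordered = [(X + 0.5 * Z + 1.0, Y + 0.25 * Z - 1.0) for X, Y, Z in prototype]
scrambled = ordered[::-1]  # image features arrive in an unknown order

best_d, order = best_correspondence(P, scrambled)
d_scrambled = distance(orthonormal_basis(P), scrambled)
```

The returned order lists, for each prototype point, the index of the image point matched to it. For a symmetric prototype several orderings can reach the same minimum, so the recovered ordering need not be unique.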

The measure $D(P, \vec{v})$ has a second role. Since it measures
the similarity between the prototype and the image,
it can also be used to determine the object's class.





Figure 1: "Natural" correspondences between two chairs.

An object observed in a view $\vec{v}$ belongs to the class
represented by a prototype $P$ if

$$D(P, \vec{v}) < \epsilon \qquad (14)$$

for some constant $\epsilon > 0$. We refer to (14) as the
categorization criterion.

The categorization stage proceeds as follows. Given
an image $I$ and a prototype $P$, the correspondence between
$P$ and $I$ is resolved by minimizing the measure
$D(P, \sigma \vec{v})$ over all possible permutations $\sigma$ of $\vec{v}$, and if the
obtained minimum $D(P, \vec{v})$ is below the threshold $\epsilon$, then
the class identity of the object is determined.

Note that in our scheme the prototype and the categorization
criterion determine the actual division of objects
to classes; an object belongs to a certain class if
its views are sufficiently similar, according to the categorization
criterion, to views of the prototype. Under
the above definition, an object belongs to a prototype's
class if the total difference between its feature points and
their corresponding prototype points does not exceed $\epsilon$.

The measure $D(P, \vec{v})$ defined here determines the similarity
between the prototype $P$ and the view $\vec{v}$ using
only the distances between feature points. In general,
since correspondence is difficult to achieve, such a measure
would not be robust. Including additional information
about the features in the similarity measure may
increase the robustness of the scheme. Also, measures
that consider only the proximity of feature points are
limited in terms of dividing the library into classes, since
they induce classes of objects with highly similar shapes.
Measures that consider additional information can extend
the classes to include larger sets of objects.

The measure $D(P, \vec{v})$ can be enriched by considering
the similarity between corresponding points. A simple
example for a measure that considers both the proximity
and similarity between feature points is the following
measure. Each feature point is associated with a label
(such as a corner or an inflection point). Again, the
measure $D(P, \vec{v})$ is applied, but this time only correspondences
between points with similar labels are allowed;
namely, corners in the image can only match corners in
the prototype, and, similarly, inflection points can only
match inflection points. Other examples for measures
that combine proximity and similarity include measures
that retain the tangent or the curvature of points. More
sophisticated measures may compare the topologies of
the objects in the two views, or, in other words, verify
that the objects share similar part structures in 2D.


A useful technique in measuring the similarity between
the image and the nearest prototype view is to
consider a different set of features than the set used to
determine the prototype transform. The rationale behind
this technique is that it is generally difficult to recover
exact feature-to-feature correspondence, and while such
correspondences are necessary for recovering the prototype
transform, similarity measures can be successfully
applied even in the absence of exact feature-to-feature
correspondence. This idea resembles the basic principle
of the alignment algorithm [18, 41], in which a small set
of points is used to compute the object pose, while a
larger set of points is used to verify this pose.

It should be noted that the general flow of the scheme
and, in particular, the identification stage are independent
of the specific choice of similarity measure. As has
been noted above, the measure affects the division of
model libraries into classes and the selection of optimal
prototypes for these classes. An example for selecting
the optimal prototype for a given class under the measure
specified in (12) (for either labeled or unlabeled features)
is described in Section 4.

Finally, although the main objective of the categorization
stage is to determine the class identity of the object,
the categorization scheme described above is useful even
if the object's category cannot be determined. Section
3.3 below shows that the prototype transform can be
reused to align the image with the specific models.
Consequently, following the categorization stage the cost of
comparing the image to each of the specific models is
substantially reduced, since the difficult part of recovering
the transformation that relates the models to the
image is applied only to the prototype objects. As a
result, if the class identity of the object cannot be determined
we still need to consider all the specific models in
the library, but the overall cost of comparing the models
to the image would be low because correspondence is
computed once for the whole class.

3.3 Identification

After the observed object is categorized, the system
turns to recovering its individual identity. At this stage
the image is matched to all the models in the object's
class. For each model, the system seeks to recover the
transformation that aligns the model to the image, if
there is such. In previous schemes this required recovering
the correspondence between the image and each of
the models separately. In our scheme, however, this no
longer is necessary, since the object transform is determined
directly from the prototype transform. We show
in this section that the prototype and the object transforms
are related by a simple transformation, which can
be computed in advance, and which can in fact be undone
already in the library of stored models. Consequently,
the prototype transform can be reused in the
identification stage to align the individual models with
the image.

The initial stage of categorization recovers three pieces of information
that can be used for identification. The three are (i) the object class,
(ii) the correspondence between the prototype and the image, and (iii)
the prototype transform. This information is used in the identification
stage as follows. First, since the object's class is determined, only
models that belong to this class are considered. Second, using the
correspondence between the prototype and the image established in the
categorization stage, and using the stored correspondence between the
prototype and the object models, the correspondence between the models
and the image is immediately recovered. Finally, as is shown below, the
model transform, namely, the transformation that aligns the model with
the image, is recovered from the prototype transform.

Assume we are given a view $\vec{v}$ of some object model $M_i$, namely

$$\vec{v} = M_i \vec{a} \qquad (15)$$

for some transform vector $\vec{a}$. When the identification process
begins, it is still unknown which of the models $M_1, \ldots, M_l$ of the
object's class accounts for the image and what the transform vector
$\vec{a}$ is. The first task faced by the scheme at this stage is to
recover the model transform $\vec{a}$. This is done, as is explained
below, using the prototype transform $\vec{b} = P^+ \vec{v}$ defined in
(9). Once $\vec{a}$ is recovered, it is applied to all the models
$M_1, \ldots, M_l$, and the model for which a near-perfect match is
obtained determines the object's identity.

Theorem 1 below establishes that the model transform $\vec{a}$ can be
recovered directly from the prototype transform $\vec{b}$ by applying a
linear transformation which is referred to as the prototype-to-model
transform. This transform has two interesting properties. First, it is
view independent; namely, for any given view of the object, the same
transform maps the prototype transform that corresponds to this view to
the correct model transform. The prototype-to-model transform therefore
can be computed in advance and stored in the library of models. Second,
the prototype-to-model transform can be used to recover the model
transform regardless of the quality of match between the prototype and
the image. In other words, even if the prototype aligns poorly with the
image, the transformation that aligns the model with the image is
determined correctly in this process.

Theorem 1: Given a view $\vec{v} = M_i \vec{a}$. Let
$\vec{b} = P^+ \vec{v}$ be the prototype transform, that is, the
transform vector that best aligns the prototype with the image. The
model transform, $\vec{a}$, can be recovered from the prototype
transform, $\vec{b}$, by applying a matrix $A_i$, namely

$$\vec{a} = A_i \vec{b}$$

$A_i$ is referred to as the prototype-to-model transform.

Proof: Notice that

$$\vec{b} = P^+ \vec{v} = P^+ M_i \vec{a}$$

Assume $P^+ M_i$ is invertible, and let

$$A_i = (P^+ M_i)^{-1}$$

We obtain that

$$\vec{a} = A_i \vec{b}$$

□

Corollary 2: The prototype-to-model transform is view-independent.

Proof: The prototype-to-model transform, $A_i$, is independent of both
pose vectors, $\vec{a}$ and $\vec{b}$. Changing the image $\vec{v}$ will
result in a new pair of pose vectors, $\vec{a}$ and $\vec{b}$, but,
similar to the old pair, the new pair is related through the same
transform $A_i$. The prototype-to-model transform $A_i$ therefore can be
used to recover the object pose for any view of $M_i$. □
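
Theorem 1 and Corollary 2 can be checked numerically. The following is a minimal sketch, not the paper's implementation: the matrix sizes, the random prototype `P`, the perturbed model `M`, and the helper `recover_model_transform` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 4  # assumed sizes: n view coordinates, k transform parameters

P = rng.standard_normal((n, k))            # hypothetical prototype matrix
M = P + 0.1 * rng.standard_normal((n, k))  # hypothetical model, similar to P

P_pinv = np.linalg.pinv(P)                 # P^+
A = np.linalg.inv(P_pinv @ M)              # A_i = (P^+ M_i)^{-1}, computed once


def recover_model_transform(v):
    """Recover a from the prototype transform b = P^+ v via a = A b."""
    b = P_pinv @ v
    return A @ b


# View independence: the same A recovers the transform for every view of M.
for _ in range(3):
    a_true = rng.standard_normal(k)
    v = M @ a_true
    assert np.allclose(recover_model_transform(v), a_true)
```

Note that `A` is computed before any view is generated, which mirrors the claim that it can be stored in the library in advance.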

$A_i$ exists if $P^+ M_i$ is invertible. This condition is equivalent to
requiring that the two column spaces of $P$ and $M_i$ will not be
orthogonal in any direction. The condition holds, in general, when the
two objects are fairly similar. This is illustrated by the following
example. Consider the case that both column spaces of $P$ and $M_i$ are
one-dimensional; namely, each represents a line through the origin. The
only case in this one-dimensional example in which $A_i$ does not exist
is when $P$ and $M_i$ are orthogonal. But these lines are farthest apart
when they are orthogonal. Consequently, if the objects are relatively
similar, $A_i$ would exist.

Since it depends only on the prototype $P$ and the model $M_i$, the
prototype-to-model transform $A_i$ can be pre-computed and stored in the
library of models. Every model $M_i \in C$ is associated with its own
transform $A_i$ that relates, for every possible view of $M_i$, the
prototype transform to the model transform. To compare the image to the
model $M_i$ the model transform should first be recovered. This is
achieved by applying $A_i$ to the prototype transform computed in the
categorization stage.

Also, the prototype-to-model transform $A_i$ can be used to align the
model $M_i$ with the prototype $P$ in 3D. Denote the aligned model by
$M'_i$. $M'_i$ models the same object as $M_i$ does, since their column
vectors span the same space. In addition, the aligned model $M'_i$ has
the property that it is brought by the prototype transform, $\vec{b}$,
to a perfect alignment with the image. Consequently, if the models are
aligned in the library with the prototype, the prototype transform
computed in the categorization stage can be reused for identification
with no further manipulations. This is established in Theorem 3 below.

Theorem 3: Let $M'_i = M_i A_i$ be the model $M_i$ aligned with the
prototype $P$. For any view $\vec{v} = M_i \vec{a}$, the prototype
transform for this view, $\vec{b} = P^+ \vec{v}$, is identical to the
model transform for this view, that is, $\vec{v} = M'_i \vec{b}$.

Proof: Since

$$M'_i = M_i A_i$$

we obtain that

$$M'_i \vec{b} = M_i A_i \vec{b} = M_i \vec{a} = \vec{v}$$

□

Using Theorem 3, the identification scheme is simplified as follows. The
models $M_1, \ldots, M_l$ are aligned in the library with the prototype
$P$ by applying the corresponding prototype-to-model transforms
$A_1, \ldots, A_l$. At recognition time, the prototype transform,
$\vec{b} = P^+ \vec{v}$, is applied to the aligned models
$M'_1, \ldots, M'_l$. According to Theorems 1 and 3, by transforming the
models by $\vec{b}$ the correct model, $M'_i$, would perfectly align
with the image.

In the scheme above we assumed that full feature-to-feature
correspondence is established between the prototype and the image. This
assumption is not mandatory. Methods for estimating the prototype
transform using partial correspondence or by considering other types of
features (such as line segments) can also be used. Note that in case the
prototype transform can only be approximated, the accuracy of the model
transform obtained is determined by the quality of this approximation
and by the condition number of the prototype-to-model transform $A_i$.
The condition number of $A_i$ affects the match even if Theorem 3 is
applied, namely, even if the models are aligned with the prototype in
advance. Consequently, the condition number of the prototype-to-model
transform $A_i$ should be taken into account when the library is divided
into classes.

Finally, the scheme can be extended to handle classes of objects with
different degrees of freedom. Consider, for instance, the case of
similar chairs, some of which are folding. Obviously, the folding chairs
have more degrees of freedom than the regular, rigid chairs, and
therefore they would be represented in the library by wider matrices
than the rigid chairs are. As is explained below, the chairs can be
handled in a common class, and the prototype for the class would itself
be a folding chair.

More generally, let $M_1, \ldots, M_l$ be a class of models of different
widths, and denote by $k_1, \ldots, k_l$ the widths of $M_1, \ldots, M_l$
respectively. Let $P$ be the prototype for this class, and denote by
$k_p$ the width of $P$. We set $k_p$ to be

$$k_p = \max\{k_1, \ldots, k_l\} \qquad (16)$$

In other words, we require the prototype to have the same degrees of
freedom as the most flexible object in the class. We can set $k_p$
according to our goal since, as is shown in Section 4, the prototype $P$
is obtained in our scheme by manipulating the objects in the class. The
prototype-to-model transform $A_i$ is defined in this case by

$$A_i = (P^+ M_i)^+ \qquad (17)$$

where $A_i$ is $k_i \times k_p$ (it maps the $k_p$-vector $\vec{b}$ to
the $k_i$-vector $\vec{a}$). It is straightforward to extend Theorem 1
to also include this case. Consequently, for any view of $M_i$, the
model transform $\vec{a}$ can be recovered from its corresponding
prototype transform $\vec{b}$ by applying the prototype-to-model
transform $A_i$ to $\vec{b}$. Note that since $k_p \geq k_i$ the
prototype can appear in poses that do not match any possible model pose
(and therefore in noiseless conditions they are impossible to obtain).
In case the object is observed from such a view, $A_i$ would map this
unmatched prototype transform to the model transform that corresponds to
the nearest matched prototype transform. By setting $k_p$ to be as large
as the maximum of $k_1, \ldots, k_l$ we avoid cases where there exist
views of the object that cannot be accounted for by the prototype. Model
transforms that correspond to such views cannot be recovered from
prototype transforms.
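
Library alignment (Theorem 3) and the pseudo-inverse form of the prototype-to-model transform in (17) can be sketched together as follows. This is only an illustration under assumed sizes: the model `M`, the wider prototype `P`, and the variable names are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k_p, k_i = 12, 5, 4  # assumed sizes: the prototype is wider than this model

M = rng.standard_normal((n, k_i))            # hypothetical model M_i
extra = rng.standard_normal((n, k_p - k_i))  # extra degree of freedom of P
P = np.hstack([M + 0.1 * rng.standard_normal((n, k_i)), extra])

P_pinv = np.linalg.pinv(P)          # P^+
A = np.linalg.pinv(P_pinv @ M)      # eq. (17): A_i = (P^+ M_i)^+, k_i x k_p
M_aligned = M @ A                   # M'_i = M_i A_i, stored in the library

a_true = rng.standard_normal(k_i)
v = M @ a_true                      # a noiseless view of the model
b = P_pinv @ v                      # prototype transform from categorization

assert np.allclose(A @ b, a_true)       # model transform recovered from b
assert np.allclose(M_aligned @ b, v)    # aligned model reproduces the view
```

With `k_p == k_i` the pseudo-inverse reduces to the ordinary inverse used in Theorem 1, so the same sketch covers both cases.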

3.4 Summary

We presented in this section a scheme for recognizing 3D objects from
single 2D views that proceeds in two stages, categorization and
identification. In the categorization stage the image is compared
against the stored prototypes. For every prototype, the correspondence
between the image and the prototype is recovered, and the nearest view
of the prototype is constructed. The similarity between this view and
the image is evaluated, and, if the two are found similar, the class
identity of the object is determined. In the identification stage the
observed object is compared against the models of its class. Since the
prototype and the models were brought in the library into alignment, the
same transformation that aligns the prototype to the image also aligns
the object model to the image. The prototype transform therefore is
applied to the models, and the obtained views are compared with the
image. The view that is found to be identical up to noise and occlusion
to the image determines the individual identity of the object.

The presented scheme is based on several key principles. Recognition is
divided into two sub-processes, categorization and identification. In
both processes models are aligned with the image, and the identity of
the object is determined by a 2D comparison; 3D reconstruction of the
observed object from the image is not performed. The difficult component
of the alignment approach, namely the recovery of correspondence and
object pose, is performed only once for each class; the prototype
transform is reused in the identification stage to align the image with
the individual models.

4 Constructing optimal prototypes

In the scheme above we assumed that the classes in the library of models
are represented by prototype objects. Since categorization is achieved
by matching the image to prototype objects, the question of how to
select the best prototype should be addressed. In this section we
present an algorithm for constructing optimal prototypes.

Given a class of objects, the optimal prototype for this class is the
object that resembles the objects of the class the most. Under our
formulation, such an object would share as many features as possible
with the objects of its class, the position of these features on the
prototype would be as close as possible to their position on the
objects, and the prototype-to-model transforms for these objects would
be as stable as possible. Below we show that the optimal prototype can
effectively be computed using principal component analysis; that is, by
computing the dominant eigenvectors for some matrix determined by the
models of the class.

Principal component analysis often is used in classification problems to
construct classes and prototypes [11]. In existing applications, an
object is represented by a point in some high dimensional space, where
each component of this point contains an invariant attribute of the
object. A hyperplane in that space represents a class of objects. The
goal of the principal component analysis is, given a set of points
(objects), to recover the class that these points induce. Our case is
somewhat different. In our case an object is represented by a continuous
linear space rather than by a point. Whereas the use of hyperplanes in
other schemes often is arbitrary and made primarily for convenience,
their use in our scheme is appropriate following the linear combination
scheme [42] (see Section 3.1).

The differences outlined above also imply differences in the proof that
principal component analysis applies to our case. We show below that the
optimal prototype can be computed by principal component analysis. The
traditional proof needs to be extended since in our case objects are
represented by continuous spaces rather than by discrete points.

The prototype constructed in this process is a 3D object obtained by
manipulating the objects in its class. To allow the construction, it
seems as if the objects in the class should first be brought into
alignment. In particular, if the objects are represented by
viewer-centered models (that is, by sets of their views, see Section 3.1
for details), the different objects would then have to be represented by
images taken from similar viewpoints. Nevertheless, the process
presented below does not require an initial alignment of the objects.
The same prototype is obtained in this process even when the objects are
not aligned.

We now turn to constructing the optimal prototype. First, we define an
objective function. Given a prototype $P$ and an object model $M_i$, we
define the similarity between $P$ and $M_i$ as follows. Let $\vec{v}$ be
a view of $M_i$. We measure the similarity between the prototype $P$ and
the view $\vec{v}$ using (12). Then, we sum the measure over all
possible views of $M_i$. Assuming without loss of generality that
$\|\vec{v}\| = 1$, (14) can be rewritten as

$$D(P, \vec{v}) = \|(P P^+ - I)\vec{v}\| \qquad (18)$$

Without loss of generality, we can assume that the constructed
prototype, $P$, is composed of orthonormal columns. Note that an
overdetermined matrix $P$ with orthonormal columns satisfies
$P^+ = P^T$. We can therefore rewrite (18) as

$$D(P, \vec{v}) = \|(P P^T - I)\vec{v}\| \qquad (19)$$

The distance between $P$ and the model $M_i$ is now given by summing
$D(P, \vec{v})$ over all unit-length (to eliminate scaling effects)
views of $M_i$, namely

$$D(P, M_i) = \int_{\vec{v} = M_i \vec{a},\ \|\vec{v}\| = 1} \|(P P^T - I)\vec{v}\| \, d\vec{v} \qquad (20)$$

To obtain the objective function, we sum these distances over all models

$$E(P) = \sum_{i=1}^{l} \int_{\vec{v} = M_i \vec{a},\ \|\vec{v}\| = 1} \|(P P^T - I)\vec{v}\| \, d\vec{v} \qquad (21)$$

The object $P$ that minimizes this function is defined to be the optimal
prototype.

Note that (21) is not the only possible objective function for this
purpose. An alternative "worst case" approach is to measure the distance
between the prototype and the farthest model in the class (rather than
summing this distance over all models). Besides being difficult to
compute, this measure also is sensitive to "outlier" models.

The prototype that minimizes (21) can be constructed in a process that
includes the following steps.

1. To simplify the process we assume the column vectors of each of the
   model matrices $M_i$ ($1 \leq i \leq l$) are orthonormal. (In case
   they are not, we first apply a Gram-Schmidt process to them. Such a
   process obviously does not alter the space of views implied by the
   models.)

2. Build the $n \times n$ symmetric matrix

   $$F = \sum_{i=1}^{l} M_i M_i^T$$

3. Find the $k$ dominant eigenvectors of $F$. The optimal matrix $P$ is
   constructed from these eigenvectors.
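
The three steps above can be sketched in a few lines. This is a hedged illustration rather than the paper's code: QR factorization stands in for the Gram-Schmidt process, and the function names, sizes, and random test models are assumptions.

```python
import numpy as np


def orthonormalize(M):
    """Step 1: orthonormal basis for the column space of M (QR, not literal Gram-Schmidt)."""
    Q, _ = np.linalg.qr(M)
    return Q


def optimal_prototype(models, k):
    """Steps 2-3: build F = sum_i M_i M_i^T and return its k dominant eigenvectors."""
    Qs = [orthonormalize(M) for M in models]
    F = sum(Q @ Q.T for Q in Qs)          # n x n symmetric matrix
    _, eigvecs = np.linalg.eigh(F)        # eigenvalues come in ascending order
    P = eigvecs[:, ::-1][:, :k]           # columns for the k largest eigenvalues
    return P, F


rng = np.random.default_rng(2)
n, k = 10, 3                              # assumed sizes
base = rng.standard_normal((n, k))
models = [base + 0.05 * rng.standard_normal((n, k)) for _ in range(4)]
P, F = optimal_prototype(models, k)

# Re-expressing every model in another basis of the same column space
# leaves F unchanged, so the models need not be aligned beforehand.
mixed = [M @ rng.standard_normal((k, k)) for M in models]
P2, F2 = optimal_prototype(mixed, k)
```

Comparing `P @ P.T` with `P2 @ P2.T` (rather than `P` with `P2`) is the meaningful check here, since eigenvectors are only determined up to sign and up to rotation within an eigenspace.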






Note that, in general, we are trying to construct a prototype object
that would belong to the given class. This condition determines the
choice of width $k$ for the prototype. If all the models share the same
width then the prototype would assume this width. In the rigid case, for
example, $k = 4$ (see Section 3.1). As mentioned in Section 3.3 above,
in case the objects have different degrees of freedom, $k$ is set to be
the maximum of $k_1, \ldots, k_l$, where $k_1, \ldots, k_l$ are the
widths of $M_1, \ldots, M_l$ respectively. In case more than $k$ large
eigenvalues are obtained, one may ignore these guideline rules and
construct a prototype that has higher degrees of freedom than the
objects in the class (see for example [31]).

Theorem 4 below establishes that the algorithm above produces the
optimal prototype. We consider here the case that all the objects share
similar degrees of freedom. The same procedure can be applied with
slight modifications to include the case of objects with different
degrees of freedom.

Theorem 4: Let $M_1, M_2, \ldots, M_l$ be a set of models belonging to
some class $C$. Assume every model $M_i$ is represented by an
$n \times k$ matrix with orthonormal column vectors. The prototype $P$
that minimizes the term

$$E(P) = \sum_{i=1}^{l} \int_{\vec{v} = M_i \vec{a},\ \|\vec{v}\| = 1} \|(P P^T - I)\vec{v}\| \, d\vec{v}$$

where the integration is done over all the unit-length views $\vec{v}$
of each model $M_i$, is composed of the $k$ eigenvectors of the matrix

$$F = \sum_{i=1}^{l} M_i M_i^T$$

that correspond to its $k$ largest eigenvalues.

Proof: Let $P$ be composed of the $k$ dominant eigenvectors of $F$.
According to regression principles $P$ minimizes the term

$$\sum_{i=1}^{l} \sum_{j=1}^{k} \|(P P^T - I)\vec{m}_{ij}\|$$

where $\vec{m}_{ij}$ is the $j$'th column vector of $M_i$. In other
words, consider $\vec{m}_{ij}$ as a point in $\mathcal{R}^n$. The space
spanned by the column vectors of $P$ is the nearest $k$-dimensional
hyperplane to these points, $\vec{m}_{ij}$. The rest of this proof
extends the claim from the discrete sum over the column vectors of $M_i$
to the continuous integral over all views spanned by these vectors.
According to our assumptions, each matrix $M_i$ contains an orthonormal
set of column vectors. Replacing these vectors by another orthonormal
basis for $M_i$ will not change the matrix $P$; that is, $P$ is
independent of the choice of orthonormal basis for the models. This is
illustrated by the following derivation. To obtain a new orthonormal
basis for the column space of $M_i$ we can apply a $k \times k$ rotation
matrix $R$ to $M_i$ (namely, $M_i R$). $P$ is the best vector space for
the new set as well, since

$$M_i R (M_i R)^T = M_i R R^T M_i^T = M_i I M_i^T = M_i M_i^T$$

$F$ therefore is constant for any choice of orthonormal vectors for
$M_1, \ldots, M_l$, and so its dominant eigenvectors represent the best
vector space for any orthonormal representation of the objects.
Consequently, $P$ minimizes the objective function regardless of choice
of basis for the models, and therefore it also minimizes the required
term

$$E(P) = \sum_{i=1}^{l} \int_{\vec{v} = M_i \vec{a},\ \|\vec{v}\| = 1} \|(P P^T - I)\vec{v}\| \, d\vec{v}$$

□

To summarize, we showed that given a class of object models, the optimal
prototype for this class is given by the dominant eigenvectors of the
matrix $F$, which is constructed from the object models. Note that in
proving Theorem 4 we showed that the prototype is independent of choice
of basis for the models. This implies that, in order to construct the
prototype, the object models $M_1, \ldots, M_l$ do not need to first be
brought into alignment. The process above guarantees to output the same
prototype object even if the models are not aligned.

5 Relevance to human vision

The recognition by prototypes scheme uses the general shape of objects
as the cue for recognizing them. As was already mentioned, classes in
our scheme contain objects with fairly similar shapes. In contrast, the
human visual system recognizes objects using both shape cues as well as
many other cues, such as color, texture, motion, and context, and
objects are categorized in their basic level of abstraction [33]. Only
little is currently known about the underlying processes for recognition
used by the visual system. From what is known, in spite of the
differences pointed out above, the recognition by prototypes scheme
seems to be consistent in several key issues with psychological and
physiological findings. In this section we briefly review these
findings.

The scheme presented in this paper promotes the notion that
categorization and identification are performed using similar tools. In
both cases view variations first are compensated for, and then a view of
either the hypothesized prototype or object model is compared with the
image. This is in contrast to methods (such as part decomposition and
functional description) that in general handle either categorization or
identification, but do not extend to deal with both problems. The
available studies in this case are inconclusive. Some evidence seems to
indicate that the two processes are handled separately by the visual
system. Agnosic and prosopagnosic patients often demonstrate degraded
identification abilities, whereas their performance in categorization
remains intact. Double dissociation between the two processes, however,
has not been found, and so the assumption that the two processes are
handled separately in the brain has not been established. In fact, both
cells that respond to general faces as well as cells that respond to
specific faces were found lying side by side within the same brain area,
STS, of the macaque monkey [29]. The vulnerability of the identification
process to brain lesions can be explained by the fact that the process
requires a relatively large memory to encode the detailed shapes of
objects as well as sophisticated image processing mechanisms to recover
a detailed description of the observed object from the image (see e.g.,
[19]).

Another idea proposed here is that categorization involves two stages: a
stage of compensating for view variations followed by a stage of 2D
comparison to account for shape differences. A decoupling of view
variation and semantic categorization was suggested by Lissauer [24].
Warrington and Taylor [44, 45] found that patients that suffer from
lesions in the posterior lobe of the right hemisphere demonstrate
difficulties in categorizing objects from unconventional views, whereas
their performance in categorization of objects from conventional views
remains intact. Additional evidence for the effect of view variations on
categorization performance was found for healthy subjects. Subjects that
are asked to name objects respond slower when the objects appear in
unconventional views [28]. Also, mental rotation effects, namely,
response time that grows linearly with the tilt of the object, were
observed in naming tasks of natural objects [21].

Finally, the process of categorization presented here is achieved by
comparing the image to prototype objects, and these prototype objects
can be constructed by manipulating the familiar objects of the class.
Recent studies indicate that response time in naming tasks is typically
shorter and error rates are lower when the observed object is similar to
the prototype [5]. Similarly, shorter reaction time is obtained when
subjects are asked to answer questions of the type "does the object X
belong to the class Y?" [34]. Other studies reported that children learn
good examples of classes before they learn poor ones [1, 32] and that
subjects recall having seen the prototype or average configuration of
studied face images even if this configuration was not studied [8].

To summarize, although the presented scheme generally does not recognize
objects in their basic level of abstraction, it is consistent with
psychological and physiological findings in several key issues including
a single approach for the two sub-problems of recognition,
categorization and identification, view dependency of the two
sub-processes, and the role of prototypes in categorization. The
findings discussed here obviously are inconclusive, since psychological
and physiological studies including the ones discussed here have more
than one possible interpretation.

6 Implementation

To test the ideas presented in the paper, we have implemented the scheme
and applied it to several objects. In our implementation, the library of
models included two classes. The first (Figure 2) contained two
four-legged chairs (denoted by A and B), and the second (Figure 3)
included two car models, a VW and a Saab.

To demonstrate categorization, we used chair A as a prototype and
matched it to an image of chair B. Correspondences between the prototype
and the image were picked manually, and, using these correspondences,
the prototype transform was recovered and applied to the prototype. The
results of matching the transformed prototype with the image are seen in
Figure 4. It can be seen that the transformed prototype (middle figure)
assumed the same orientation as the observed object (left figure), and
that the match between the two is good considering that the objects have
different shapes. Note that in this implementation we allowed the
objects to undergo general affine transformations in 3D, including
stretch and shear, and so the match between the prototype and the image
was better than if only rigid transformations were allowed. Additional
examples using chair B and the two cars as the prototypes are shown in
Figures 5-7.

In Figures 8-9 we tried to match the prototypes to the images with wrong
correspondences. The results of these matches were significantly worse
than when the correct matches were used. This is consistent with the
idea discussed in Section 3.2 that the quality of the match can be used
as the objective function for resolving the correct correspondence.
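
The effect of a wrong correspondence on the match quality can be illustrated with a small synthetic sketch. Everything here is assumed for illustration: the random prototype `P`, the normalized residual used as the match measure (modelled on the form of (18)), the simplification that each entry of the view vector corresponds to one feature, and the choice of a reversed feature order as the "wrong" correspondence. It does not reproduce the experiments of Figures 8-9.

```python
import numpy as np


def match_residual(P, v):
    """Relative distance from v to the nearest prototype view: ||(P P^+ - I) v|| / ||v||."""
    nearest = P @ (np.linalg.pinv(P) @ v)
    return np.linalg.norm(nearest - v) / np.linalg.norm(v)


rng = np.random.default_rng(3)
n, k = 12, 4                        # assumed sizes
P = rng.standard_normal((n, k))     # hypothetical prototype
v = P @ rng.standard_normal(k)      # a view with the correct feature ordering
v_wrong = v[::-1]                   # same values, features paired in reverse order

r_correct = match_residual(P, v)        # essentially zero for a noiseless view
r_wrong = match_residual(P, v_wrong)    # generically far from zero
```

A search over candidate correspondences could, under these assumptions, keep the ordering with the smallest residual; how that search is organized is left open here.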

Figure 10 shows the results of matching a prototype four-legged chair to
a single-legged office chair. It can be seen that the upper portions of
the chairs match relatively well, while the legs of the chairs do not
find appropriate matches.

Figure 11 shows the result of matching a prototype chair to an image of
a Saab car. As an anecdotal example, we matched the hole below the back
of the chair to the windshield of the car and the seat to the hood. In
general, whatever correspondence is used, the two objects would match
poorly relative to matching the prototypes to objects of their class.

Figures 12-13 demonstrate the identification stage. In the library we
first aligned the model for chair A with the prototype chair (chair B)
using the prototype-to-model transform. Then, an image of chair A was
categorized (Figure 5) by matching it to the prototype chair, and the
prototype transform was computed. In the next step, the prototype
transform was applied to the specific model of chair A. The result of
this application is seen in Figure 12. It can be seen that a
near-perfect alignment was achieved in this process. A similar process
was applied to the VW car in Figure 13 using the Saab car as the
prototype. (The result of the corresponding categorization stage is
shown in Figure 6.) These figures demonstrate that although a perfect
match between the prototype and the image could not be obtained, the
prototype transform can still be used to align the observed object with
its specific model.

7 Summary

We introduced in this paper a recognition scheme that proceeds in two
stages: categorization and identification. Categorization is achieved by
aligning the image to prototype objects. For every prototype, the
nearest prototype view is recovered, and the similarity between this
view and the image is evaluated. The prototype that most resembles the
observed object determines its class identity. Likewise, identification
is achieved by aligning the observed object to the individual models of
its class. At this stage the prototype transform computed in the
categorization stage is reused to align the models with the image. The
model that matches the observed object determines its specific identity.
In addition, we presented an algorithm for constructing the optimal
prototypes and discussed the relevance of the scheme to human
recognition.

An important issue conveyed by our scheme is that categorization can be
used to facilitate the identification of objects. We showed that by
first categorizing the object, the difficult stages of the alignment
process, namely, the recovery of the object pose and the correspondence
between the image and the model, can be performed only once per class.
Consequently, identification is reduced in this scheme into a series of
simple template comparisons.

The scheme presented in this paper differs from existing categorization
schemes in two important aspects. The existing schemes (e.g., [4]) first
attempt to recover the part structure (geons) of the object from the
image alone. This structure is assumed to be almost invariant both to
rotation of the object and across objects of the same class. In
contrast, our scheme does not attempt to recover any 3D information from
the image alone. Moreover, it separates the two effects that determine
the object's appearance: view variation effects and deformations due to
class variability. View variations are compensated for by recovering the
view of the prototype that most resembles the image, and the amount of
deformation that separates the prototype from the specific object is
evaluated by assessing the difference (in 2D) between the nearest
prototype view and the image.

Open problems for future research include solving the correspondence
between prototypes and images, combining the scheme with existing
indexing approaches, defining effective measures to evaluate the quality
of matches, and extending the system to incorporate additional cues,
such as color and texture.

Figure 2: Pictures of two chairs used as models. We refer to these
chairs by A (left) and B (right). Models for the two chairs were
constructed from single images using symmetry [31].

Figure 3: Pictures of two cars used as models. Left: a VW model. Right:
a Saab model. Models for the two cars were borrowed from [42].

Figure 4: Matching a prototype chair (chair A) to an image of chair B.
This figure, as well as the rest of the figures, contains three
pictures. Left: the image to be recognized. Middle: the appearance of
the prototype following the application of the prototype transform.
Right: an overlay of the left and the middle pictures.

Figure 5: Matching a prototype chair (chair B) to an image of chair A.

Figure 6: Matching a prototype car (Saab) to an image of a VW car.

Figure 7: Matching a prototype car (VW) to an image of a Saab car.

Figure 8: Matching a prototype chair (chair B) to an image of chair A
with wrong correspondence.

Figure 9: Matching a prototype car (Saab) to an image of a VW car with
wrong correspondence.

Figure 10: Matching a four-legged chair to an image of an office chair.

Figure 11: Matching a prototype chair (chair A) to an image of a Saab
car.

Figure 12: Matching a model of chair A to an image of the same chair
using the prototype transform computed in the categorization stage.

Figure 13: Matching a model of a VW car to an image of the same car
using the prototype transform computed in the categorization stage.